Since KNN is such a simple algorithm, we will just use this "Project" as a simple exercise to test your understanding of the implementation of KNN. By now you should feel comfortable implementing a machine learning algorithm in R, as long as you know what library to use for it.
So for this project, just follow along with the bolded instructions. It should be very simple, so at the end you'll have an additional optional "bonus" project.
We'll use the famous iris data set for this project. It's a small data set with flower features that can be used to attempt to predict the species of an iris flower.
Get the iris data set and check the head of the iris data frame. (You can load the ISLR library as in earlier lectures, but iris also ships with base R in the datasets package, so it is available without loading anything.)
In this case, the iris data set has all its features in the same order of magnitude, but it's good practice (especially with KNN) to standardize the features in your data. Let's go ahead and do this even though it's not necessary for this data!
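A minimal sketch of this step. It relies on `iris` being part of the `datasets` package, which R attaches by default, so nothing extra is loaded here; the `str()` call is an optional addition to show the column types.

```r
# iris is provided by the datasets package, attached by default in R
data(iris)

# First six rows: four numeric features plus the Species factor
head(iris)

# Optional: inspect column types and dimensions
str(iris)
```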
Use scale() to standardize the feature columns of the iris dataset. Set this standardized version of the data as a new variable.
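One way this could look, assuming the four numeric features sit in columns 1 to 4 (as they do in the built-in `iris`). The variable name `standardized_features` is only an illustrative choice.

```r
# Columns 1-4 are the numeric features; column 5 is the Species factor
# scale() centers each column to mean 0 and rescales it to standard deviation 1
standardized_features <- scale(iris[, 1:4])

head(standardized_features)
```

Note that `scale()` returns a matrix rather than a data frame.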
Check that the scaling worked by checking the variance of one of the new columns (it should be 1).
Join the standardized data with the response/target/label column (the column with the species names).
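A quick check along these lines, again using the illustrative name `standardized_features`. Comparing against the unscaled column makes the effect visible.

```r
standardized_features <- scale(iris[, 1:4])

# Variance of the first feature before scaling
var(iris[, 1])

# Variance of the same feature after scaling; should be 1 (up to rounding)
var(standardized_features[, 1])
```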
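A possible sketch of the join, using `cbind()`. The names `standardized_features` and `final_data` are assumptions made for illustration; the scaled matrix is converted to a data frame first so the result keeps `Species` as a factor.

```r
standardized_features <- scale(iris[, 1:4])

# Combine the scaled features with the original label column
final_data <- cbind(as.data.frame(standardized_features),
                    Species = iris$Species)

head(final_data)
```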
Use the caTools library to split your standardized data into train and test sets. Use a 70/30 split.
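The split could be sketched as follows, assuming the `caTools` package is installed. The seed value 101 and the names `split_mask`, `train`, and `test` are arbitrary illustrative choices; `sample.split()` returns a logical vector that preserves the relative proportion of each species.

```r
library(caTools)

set.seed(101)  # arbitrary seed, only so the split is reproducible

standardized_features <- scale(iris[, 1:4])
final_data <- cbind(as.data.frame(standardized_features),
                    Species = iris$Species)

# TRUE marks rows for the training set (about 70% of each species)
split_mask <- sample.split(final_data$Species, SplitRatio = 0.70)

train <- final_data[split_mask, ]
test  <- final_data[!split_mask, ]

dim(train)
dim(test)
```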
Call the class library.
Use the knn function to predict Species of the test set. Use k=1.
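A sketch of the prediction step, assuming `caTools` is installed (`class` ships with standard R distributions). The data preparation is repeated so the snippet runs on its own; the seed and all variable names are illustrative assumptions.

```r
library(caTools)
library(class)

set.seed(101)  # arbitrary seed for a reproducible split

final_data <- cbind(as.data.frame(scale(iris[, 1:4])),
                    Species = iris$Species)
split_mask <- sample.split(final_data$Species, SplitRatio = 0.70)
train <- final_data[split_mask, ]
test  <- final_data[!split_mask, ]

# knn() takes the training features, the test features,
# the training labels (cl), and the number of neighbours k
predicted_species <- knn(train = train[, 1:4],
                         test  = test[, 1:4],
                         cl    = train$Species,
                         k     = 1)

head(predicted_species)
```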
What was your misclassification rate?
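The misclassification rate is simply the fraction of test rows whose predicted species differs from the true one. A self-contained sketch under the same assumptions as before (illustrative names and seed, `caTools` installed); the value you get depends on the random split.

```r
library(caTools)
library(class)

set.seed(101)  # arbitrary seed for a reproducible split

final_data <- cbind(as.data.frame(scale(iris[, 1:4])),
                    Species = iris$Species)
split_mask <- sample.split(final_data$Species, SplitRatio = 0.70)
train <- final_data[split_mask, ]
test  <- final_data[!split_mask, ]

predicted_species <- knn(train[, 1:4], test[, 1:4], train$Species, k = 1)

# Proportion of test observations that were predicted incorrectly
misclass_rate <- mean(test$Species != predicted_species)
misclass_rate
```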
Although our data is quite small for us to really get a feel for choosing a good K value, let's practice.
Create a plot of the error (misclassification) rate for k values ranging from 1 to 10.
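One way to build this plot is to loop over the k values, store the error for each, and draw the result. The sketch below uses base R graphics to avoid extra dependencies; ggplot2 would work equally well. Names and the seed are illustrative assumptions, and `caTools` is assumed to be installed.

```r
library(caTools)
library(class)

set.seed(101)  # arbitrary seed for a reproducible split

final_data <- cbind(as.data.frame(scale(iris[, 1:4])),
                    Species = iris$Species)
split_mask <- sample.split(final_data$Species, SplitRatio = 0.70)
train <- final_data[split_mask, ]
test  <- final_data[!split_mask, ]

k_values <- 1:10

# Misclassification rate on the test set for each k
error_rate <- sapply(k_values, function(k) {
  pred <- knn(train[, 1:4], test[, 1:4], train$Species, k = k)
  mean(test$Species != pred)
})

plot(k_values, error_rate, type = "b", lty = "dotted", col = "red",
     xlab = "K value", ylab = "Misclassification rate",
     main = "Error rate by K")
```

Because `knn()` breaks ties at random and the split itself is random, the curve will vary from run to run unless a seed is set.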
You will likely notice that the error drops to its lowest for k values of roughly 2 to 6 and then begins to jump back up again (the exact range depends on your random split). This is due to how small the data set is: with about 105 training rows, k=10 already approaches 10% of the data, which is quite large.
You should feel pretty comfortable using KNN since it's so simple. As an optional assignment, choose a data set from the UCI Machine Learning Repository and see if you can use the process laid out above to do your own classification!